NSF PAR Search | NSF Public Access Repository

Recent Event Camera Innovations: A Survey

https://doi.org/10.1007/978-3-031-92460-6_21

Chakravarthi, Bharatesh; Verma, Aayush Atul; Daniilidis, Kostas; Fermuller, Cornelia; Yang, Yezhou (January 2025, Springer Nature Switzerland)

Full Text Available
FisherRF: Active View Selection and Mapping with Radiance Fields Using Fisher Information

Jiang, Wen; Lei, Boshu; Daniilidis, Kostas (October 2024, ECCV)

Full Text Available
FisherRF: Active View Selection and Mapping with Radiance Fields Using Fisher Information

Jiang, Wen; Lei, Boshu; Daniilidis, Kostas (October 2024, Springer)

This study addresses the challenging problem of active view selection and uncertainty quantification within the domain of Radiance Fields. Neural Radiance Fields (NeRF) have greatly advanced image rendering and reconstruction, but the cost of acquiring images creates a need to select the most informative viewpoints efficiently. Existing approaches depend on modifying the model architecture or on a hypothetical perturbation field to indirectly approximate the model uncertainty. However, selecting views from an indirect approximation does not guarantee optimal information gain for the model. By leveraging Fisher Information, we directly quantify observed information on the parameters of Radiance Fields and select candidate views by maximizing the Expected Information Gain (EIG). Our method achieves state-of-the-art results on multiple tasks, including view selection, active mapping, and uncertainty quantification, demonstrating its potential to advance the field of Radiance Fields.
Full Text Available
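
The FisherRF abstract above describes scoring candidate views by Expected Information Gain computed from Fisher Information. The following is a minimal toy sketch of that idea, not the authors' implementation. It assumes a diagonal approximation of the Fisher information (the diagonal of J^T J, where J is a view's Jacobian of rendered pixels with respect to model parameters) and a small prior term for numerical stability. The function names, the `prior` value, and the two-parameter example are all hypothetical.

```python
from typing import List, Sequence, Tuple

Jacobian = Sequence[Sequence[float]]  # rows = pixels, columns = parameters


def diag_fisher(jacobian: Jacobian) -> List[float]:
    """Diagonal of J^T J: per-parameter sum of squared sensitivities."""
    n_params = len(jacobian[0])
    return [sum(row[j] ** 2 for row in jacobian) for j in range(n_params)]


def info_gain_score(candidate_diag: Sequence[float],
                    observed_diag: Sequence[float],
                    prior: float = 1e-2) -> float:
    """Diagonal approximation of tr(H_candidate (H_observed + prior * I)^-1).

    `prior` is an assumed regulariser that keeps unobserved
    parameters from causing a division by zero.
    """
    return sum(c / (o + prior) for c, o in zip(candidate_diag, observed_diag))


def select_next_view(candidates: Sequence[Jacobian],
                     observed_diag: Sequence[float],
                     prior: float = 1e-2) -> Tuple[int, List[float]]:
    """Return the index of the highest-scoring candidate and all scores."""
    scores = [info_gain_score(diag_fisher(j), observed_diag, prior)
              for j in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy model with two parameters. The views seen so far only constrain
# parameter 0, so parameter 1 is still highly uncertain.
observed = diag_fisher([[1.0, 0.0], [2.0, 0.0]])
candidates = [
    [[1.0, 0.0]],  # candidate 0: looks at parameter 0 again
    [[0.0, 1.0]],  # candidate 1: looks at the unconstrained parameter 1
]
best, scores = select_next_view(candidates, observed)
```

In this toy setting the second candidate scores far higher because it observes a parameter that the existing views leave unconstrained. That is the behaviour the abstract attributes to maximizing information gain.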
Uncertainty-Aware Deployment of Pre-trained Language-Conditioned Imitation Learning Policies

https://doi.org/10.1109/IROS58592.2024.10802849

Wu, Bo; Lee, Bruce D; Daniilidis, Kostas; Bucher, Bernadette; Matni, Nikolai (October 2024, IEEE)

Full Text Available
Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting

https://doi.org/10.1109/ICRA55743.2025.11127233

Strong, Matthew; Lei, Boshu; Swann, Aiden; Jiang, Wen; Daniilidis, Kostas; Kennedy, Monroe (May 2025, IEEE)

We propose a framework for active next best view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it can represent scenes in a manner that is both photorealistic and geometrically accurate. However, in real-world, online robotic scenes where the number of views is limited by efficiency requirements, random view selection for 3DGS becomes impractical as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2) that we supplement with Pearson depth and surface normal loss to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes and extend depth-based FisherRF to them, demonstrating both qualitative and quantitative improvements on challenging robot scenes. For more information, please see our project page at arm.stanford.edu/next-best-sense.
Free, publicly-accessible full text available May 19, 2026
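
The abstract above mentions a Pearson depth loss without defining it. A common formulation in few-view Gaussian Splatting work is one minus the Pearson correlation between rendered depth and a reference depth estimate, which makes the loss invariant to the scale and offset of the reference. The sketch below assumes that formulation; it is not taken from the paper, and the function name and flat-list depth representation are hypothetical.

```python
import math
from typing import Sequence


def pearson_depth_loss(rendered: Sequence[float],
                       reference: Sequence[float]) -> float:
    """Return 1 - Pearson correlation between two flattened depth maps.

    0 means perfectly correlated depths, 2 means perfectly anti-correlated.
    Assumed formulation; the paper's exact loss may differ.
    """
    n = len(rendered)
    if n != len(reference) or n < 2:
        raise ValueError("depth maps must have equal length of at least 2")
    mean_r = sum(rendered) / n
    mean_f = sum(reference) / n
    cov = sum((a - mean_r) * (b - mean_f) for a, b in zip(rendered, reference))
    var_r = sum((a - mean_r) ** 2 for a in rendered)
    var_f = sum((b - mean_f) ** 2 for b in reference)
    denom = math.sqrt(var_r * var_f)
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant depth map")
    return 1.0 - cov / denom


# A reference depth that differs only by scale gives (near) zero loss,
# while a reversed depth ordering gives the maximum loss of 2.
loss_scaled = pearson_depth_loss([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
loss_reversed = pearson_depth_loss([1.0, 2.0, 3.0, 4.0], [40.0, 30.0, 20.0, 10.0])
```

Scale invariance is the usual reason for this choice: monocular depth estimators typically predict relative depth, not metric depth, so a correlation-based loss avoids having to align the two scales first.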
NAP: Neural 3D Articulated Object Prior

Lei, Jiahui; Deng, Congyue; Shen, Bokui; Guibas, Leonidas; Daniilidis, Kostas (September 2023, NeurIPS - OpenReview)

Full Text Available
Banana: Banach Fixed-Point Network for Pointcloud Segmentation with Inter-Part Equivariance

Deng, Congyue; Lei, Jiahui; Shen, Bokui; Daniilidis, Kostas; Guibas, Leonidas (September 2023, NeurIPS - OpenReview)

Full Text Available
Multi-view Tracking, Re-ID, and Social Network Analysis of a Flock of Visually Similar Birds in an Outdoor Aviary

https://doi.org/10.1007/s11263-023-01768-z

Xiao, Shiting; Wang, Yufu; Perkes, Ammon; Pfrommer, Bernd; Schmidt, Marc; Daniilidis, Kostas; Badger, Marc (June 2023, International Journal of Computer Vision)

Full Text Available
Uncertainty-driven Planner for Exploration and Navigation

https://doi.org/10.1109/ICRA46639.2022.9812423

Georgakis, Georgios; Bucher, Bernadette; Arapin, Anton; Schmeckpeper, Karl; Matni, Nikolai; Daniilidis, Kostas (May 2022, 2022 International Conference on Robotics and Automation (ICRA))

Full Text Available
Cross-modal Map Learning for Vision and Language Navigation

https://doi.org/10.1109/CVPR52688.2022.01502

Georgakis, Georgios; Schmeckpeper, Karl; Wanchoo, Karan; Dan, Soham; Miltsakaki, Eleni; Roth, Dan; Daniilidis, Kostas (June 2022, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

We consider the problem of Vision-and-Language Navigation (VLN). The majority of current methods for VLN are trained end-to-end using either unstructured memory such as LSTM, or cross-modal attention over the egocentric observations of the agent. In contrast to other works, our key insight is that the association between language and vision is stronger when it occurs in explicit spatial representations. In this work, we propose a cross-modal map learning model for vision-and-language navigation that first learns to predict the top-down semantics on an egocentric map for both observed and unobserved regions, and then predicts a path towards the goal as a set of waypoints. In both cases, the prediction is informed by the language through cross-modal attention mechanisms. We experimentally test the basic hypothesis that language-driven navigation can be solved given a map, and then show competitive results on the full VLN-CE benchmark.
Full Text Available